In Class Exercise 5

Geographically Weighted Logistic Regression (GWLR) and Application

Overview

In Class Exercise 5

Getting Started

Installing R packages

We will install he following R packages for this In-Class Exercise:

sf, tidyverse, funModeling, blorr, corrplot, ggpubr, spdep, GWmodel, tmap, skimr, caret, report

pacman::p_load(sf, tidyverse, funModeling, blorr, corrplot, ggpubr, spdep, GWmodel, tmap, skimr, caret, report)

Importing Data

First, we are going to import the water point data in R environment.

Osun <- read_rds("rds/Osun.rds")
Osun_wp_sf <- read_rds("rds/Osun_wp_sf.rds")

Viewing Functional & Non-functional

We can view the percentage of functional water points and the percentage of non functional water points in Osun using the code chunk below.

Osun_wp_sf %>%
  freq(input = 'status')
Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.

  status frequency percentage cumulative_perc
1   TRUE      2642       55.5            55.5
2  FALSE      2118       44.5           100.0

We can see that 44.5% of the water points in Osun are non-functional.

Plotting the water points

The below plot allows us to see the locations of the functional and the non-functional water points plotted on the map of Osun.

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(Osun) +
# tmap_option
  tm_polygons(alpha = 0.4) +
  tm_shape(Osun_wp_sf) +
  tm_dots(col = "status", alpha = 0.6) +
  tm_view(set.zoom.limits = c(9, 12))

Data Wrangling

The below code chunk using the skim() function allows us to see which variables have missing values and we should exclude them from our analysis.

Osun_wp_sf %>%
  skim()
Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined `sfl`
provided. Falling back to `character`.
Data summary
Name Piped data
Number of rows 4760
Number of columns 75
_______________________
Column type frequency:
character 47
logical 5
numeric 23
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
source 0 1.00 5 44 0 2 0
report_date 0 1.00 22 22 0 42 0
status_id 0 1.00 2 7 0 3 0
water_source_clean 0 1.00 8 22 0 3 0
water_source_category 0 1.00 4 6 0 2 0
water_tech_clean 24 0.99 9 23 0 3 0
water_tech_category 24 0.99 9 15 0 2 0
facility_type 0 1.00 8 8 0 1 0
clean_country_name 0 1.00 7 7 0 1 0
clean_adm1 0 1.00 3 5 0 5 0
clean_adm2 0 1.00 3 14 0 35 0
clean_adm3 4760 0.00 NA NA 0 0 0
clean_adm4 4760 0.00 NA NA 0 0 0
installer 4760 0.00 NA NA 0 0 0
management_clean 1573 0.67 5 37 0 7 0
status_clean 0 1.00 9 32 0 7 0
pay 0 1.00 2 39 0 7 0
fecal_coliform_presence 4760 0.00 NA NA 0 0 0
subjective_quality 0 1.00 18 20 0 4 0
activity_id 4757 0.00 36 36 0 3 0
scheme_id 4760 0.00 NA NA 0 0 0
wpdx_id 0 1.00 12 12 0 4760 0
notes 0 1.00 2 96 0 3502 0
orig_lnk 4757 0.00 84 84 0 1 0
photo_lnk 41 0.99 84 84 0 4719 0
country_id 0 1.00 2 2 0 1 0
data_lnk 0 1.00 79 96 0 2 0
water_point_history 0 1.00 142 834 0 4750 0
clean_country_id 0 1.00 3 3 0 1 0
country_name 0 1.00 7 7 0 1 0
water_source 0 1.00 8 30 0 4 0
water_tech 0 1.00 5 37 0 20 0
adm2 0 1.00 3 14 0 33 0
adm3 4760 0.00 NA NA 0 0 0
management 1573 0.67 5 47 0 7 0
adm1 0 1.00 4 5 0 4 0
New Georeferenced Column 0 1.00 16 35 0 4760 0
lat_lon_deg 0 1.00 13 32 0 4760 0
public_data_source 0 1.00 84 102 0 2 0
converted 0 1.00 53 53 0 1 0
created_timestamp 0 1.00 22 22 0 2 0
updated_timestamp 0 1.00 22 22 0 2 0
Geometry 0 1.00 33 37 0 4760 0
ADM2_EN 0 1.00 3 14 0 30 0
ADM2_PCODE 0 1.00 8 8 0 30 0
ADM1_EN 0 1.00 4 4 0 1 0
ADM1_PCODE 0 1.00 5 5 0 1 0

Variable type: logical

skim_variable n_missing complete_rate mean count
rehab_year 4760 0 NaN :
rehabilitator 4760 0 NaN :
is_urban 0 1 0.39 FAL: 2884, TRU: 1876
latest_record 0 1 1.00 TRU: 4760
status 0 1 0.56 TRU: 2642, FAL: 2118

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
row_id 0 1.00 68550.48 10216.94 49601.00 66874.75 68244.50 69562.25 471319.00 ▇▁▁▁▁
lat_deg 0 1.00 7.68 0.22 7.06 7.51 7.71 7.88 8.06 ▁▂▇▇▇
lon_deg 0 1.00 4.54 0.21 4.08 4.36 4.56 4.71 5.06 ▃▆▇▇▂
install_year 1144 0.76 2008.63 6.04 1917.00 2006.00 2010.00 2013.00 2015.00 ▁▁▁▁▇
fecal_coliform_value 4760 0.00 NaN NA NA NA NA NA NA
distance_to_primary_road 0 1.00 5021.53 5648.34 0.01 719.36 2972.78 7314.73 26909.86 ▇▂▁▁▁
distance_to_secondary_road 0 1.00 3750.47 3938.63 0.15 460.90 2554.25 5791.94 19559.48 ▇▃▁▁▁
distance_to_tertiary_road 0 1.00 1259.28 1680.04 0.02 121.25 521.77 1834.42 10966.27 ▇▂▁▁▁
distance_to_city 0 1.00 16663.99 10960.82 53.05 7930.75 15030.41 24255.75 47934.34 ▇▇▆▃▁
distance_to_town 0 1.00 16726.59 12452.65 30.00 6876.92 12204.53 27739.46 44020.64 ▇▅▃▃▂
rehab_priority 2654 0.44 489.33 1658.81 0.00 7.00 91.50 376.25 29697.00 ▇▁▁▁▁
water_point_population 4 1.00 513.58 1458.92 0.00 14.00 119.00 433.25 29697.00 ▇▁▁▁▁
local_population_1km 4 1.00 2727.16 4189.46 0.00 176.00 1032.00 3717.00 36118.00 ▇▁▁▁▁
crucialness_score 798 0.83 0.26 0.28 0.00 0.07 0.15 0.35 1.00 ▇▃▁▁▁
pressure_score 798 0.83 1.46 4.16 0.00 0.12 0.41 1.24 93.69 ▇▁▁▁▁
usage_capacity 0 1.00 560.74 338.46 300.00 300.00 300.00 1000.00 1000.00 ▇▁▁▁▅
days_since_report 0 1.00 2692.69 41.92 1483.00 2688.00 2693.00 2700.00 4645.00 ▁▇▁▁▁
staleness_score 0 1.00 42.80 0.58 23.13 42.70 42.79 42.86 62.66 ▁▁▇▁▁
location_id 0 1.00 235865.49 6657.60 23741.00 230638.75 236199.50 240061.25 267454.00 ▁▁▁▁▇
cluster_size 0 1.00 1.05 0.25 1.00 1.00 1.00 1.00 4.00 ▇▁▁▁▁
lat_deg_original 4760 0.00 NaN NA NA NA NA NA NA
lon_deg_original 4760 0.00 NaN NA NA NA NA NA NA
count 0 1.00 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁

Creating new Data Frame

The code chunk below creates a new sf data frame dropping the variables with missing values.

Osun_wp_sf_clean <- Osun_wp_sf %>%
  filter_at(vars(status, distance_to_primary_road, distance_to_secondary_road, distance_to_tertiary_road, distance_to_city, distance_to_town, water_point_population, local_population_1km, usage_capacity, is_urban, water_source_clean), all_vars(!is.na(.))) %>%
              mutate(usage_capacity = as.factor(usage_capacity))

Things to learn from code chunk above. We have dropped the variables that have missing values and it will irrelevant if we have included them because we will be losing data points that we need to analyse if we have included variables with missing values.

Osun_wp <- Osun_wp_sf_clean %>%
  select(c(7, 35:39, 42:43, 46:47, 57)) %>%
  st_set_geometry(NULL)

Correlation Matrix

The code chunk below will plot the correlation matrix for variables from columns 2 to 7 of the osun_wp data frame, which are the quantitative data. This is to check for multicollinearity between the independent variables.

cluster_vars.cor = cor(Osun_wp[, 2:7])
corrplot.mixed(cluster_vars.cor, lower = "ellipse", upper = "number", tl.pos = "lt", diag = "l", tl.col = "black")

Logistic Regression

The below code chunk constructs the model for the logistic regression for our sf data frame, osun_wp_sf_clean.

model <- glm(status ~ distance_to_primary_road + distance_to_secondary_road + distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban + usage_capacity + water_source_clean + water_point_population + local_population_1km, data = Osun_wp_sf_clean, family = binomial(link = 'logit'))

Bayesian Linear Regression

Instead of using typical R report, we use the blr_regress of blorr package is used to check the model overview and observe the maximum likelihood estimates and the respective p-values to check if they are significant.

blr_regress(model)
                             Model Overview                              
------------------------------------------------------------------------
Data Set    Resp Var    Obs.    Df. Model    Df. Residual    Convergence 
------------------------------------------------------------------------
  data       status     4756      4755           4744           TRUE     
------------------------------------------------------------------------

                    Response Summary                     
--------------------------------------------------------
Outcome        Frequency        Outcome        Frequency 
--------------------------------------------------------
   0             2114              1             2642    
--------------------------------------------------------

                                 Maximum Likelihood Estimates                                   
-----------------------------------------------------------------------------------------------
               Parameter                    DF    Estimate    Std. Error    z value     Pr(>|z|) 
-----------------------------------------------------------------------------------------------
              (Intercept)                   1      0.3887        0.1124      3.4588       5e-04 
        distance_to_primary_road            1      0.0000        0.0000     -0.7153      0.4744 
       distance_to_secondary_road           1      0.0000        0.0000     -0.5530      0.5802 
       distance_to_tertiary_road            1      1e-04         0.0000      4.6708      0.0000 
            distance_to_city                1      0.0000        0.0000     -4.7574      0.0000 
            distance_to_town                1      0.0000        0.0000     -4.9170      0.0000 
              is_urbanTRUE                  1     -0.2971        0.0819     -3.6294       3e-04 
           usage_capacity1000               1     -0.6230        0.0697     -8.9366      0.0000 
water_source_cleanProtected Shallow Well    1      0.5040        0.0857      5.8783      0.0000 
   water_source_cleanProtected Spring       1      1.2882        0.4388      2.9359      0.0033 
         water_point_population             1      -5e-04        0.0000    -11.3686      0.0000 
          local_population_1km              1      3e-04         0.0000     19.2953      0.0000 
-----------------------------------------------------------------------------------------------

 Association of Predicted Probabilities and Observed Responses  
---------------------------------------------------------------
% Concordant          0.7347          Somers' D        0.4693   
% Discordant          0.2653          Gamma            0.4693   
% Tied                0.0000          Tau-a            0.2318   
Pairs                5585188          c                0.7347   
---------------------------------------------------------------

Now we view the report of the model with the code chunk below.

report(model)
We fitted a logistic model (estimated using ML) to predict status with
distance_to_primary_road (formula: status ~ distance_to_primary_road +
distance_to_secondary_road + distance_to_tertiary_road + distance_to_city +
distance_to_town + is_urban + usage_capacity + water_source_clean +
water_point_population + local_population_1km). The model's explanatory power
is moderate (Tjur's R2 = 0.16). The model's intercept, corresponding to
distance_to_primary_road = 0, is at 0.39 (95% CI [0.17, 0.61], p < .001).
Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_secondary_road
(formula: status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_secondary_road = 0,
is at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_tertiary_road (formula:
status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_tertiary_road = 0,
is at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_city (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_city = 0, is at 0.39
(95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_town (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_town = 0, is at 0.39
(95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with is_urban (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to is_urban = [?], is at 0.39 (95%
CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with usage_capacity (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to usage_capacity = 300, is at 0.39
(95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_source_clean (formula: status
~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_source_clean = Borehole,
is at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_point_population (formula:
status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_point_population = 0, is
at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation. and We fitted a logistic
model (estimated using ML) to predict status with local_population_1km
(formula: status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to local_population_1km = 0, is at
0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.

We can see from the above report that the following variables,

  1. distance_to_primary_road

  2. distance_to_secondary_road

are insignificant and we should consider excluding them in the further analysis. We create a new “model2” to exclude the 2 insignificant variables.

Creating New Model

The code chunk below creates a new model without the 2 insignificant variables

model2 <- glm(status ~ distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban + usage_capacity + water_source_clean + water_point_population + local_population_1km, data = Osun_wp_sf_clean, family = binomial(link = 'logit'))
blr_regress(model2)
                             Model Overview                              
------------------------------------------------------------------------
Data Set    Resp Var    Obs.    Df. Model    Df. Residual    Convergence 
------------------------------------------------------------------------
  data       status     4756      4755           4746           TRUE     
------------------------------------------------------------------------

                    Response Summary                     
--------------------------------------------------------
Outcome        Frequency        Outcome        Frequency 
--------------------------------------------------------
   0             2114              1             2642    
--------------------------------------------------------

                                 Maximum Likelihood Estimates                                   
-----------------------------------------------------------------------------------------------
               Parameter                    DF    Estimate    Std. Error    z value     Pr(>|z|) 
-----------------------------------------------------------------------------------------------
              (Intercept)                   1      0.3540        0.1055      3.3541       8e-04 
       distance_to_tertiary_road            1      1e-04         0.0000      4.9096      0.0000 
            distance_to_city                1      0.0000        0.0000     -5.2022      0.0000 
            distance_to_town                1      0.0000        0.0000     -5.4660      0.0000 
              is_urbanTRUE                  1     -0.2667        0.0747     -3.5690       4e-04 
           usage_capacity1000               1     -0.6206        0.0697     -8.9081      0.0000 
water_source_cleanProtected Shallow Well    1      0.4947        0.0850      5.8228      0.0000 
   water_source_cleanProtected Spring       1      1.2790        0.4384      2.9174      0.0035 
         water_point_population             1      -5e-04        0.0000    -11.3902      0.0000 
          local_population_1km              1      3e-04         0.0000     19.4069      0.0000 
-----------------------------------------------------------------------------------------------

 Association of Predicted Probabilities and Observed Responses  
---------------------------------------------------------------
% Concordant          0.7349          Somers' D        0.4697   
% Discordant          0.2651          Gamma            0.4697   
% Tied                0.0000          Tau-a            0.2320   
Pairs                5585188          c                0.7349   
---------------------------------------------------------------

Now we view the report of the model with the code chunk below.

report(model2)
We fitted a logistic model (estimated using ML) to predict status with
distance_to_tertiary_road (formula: status ~ distance_to_tertiary_road +
distance_to_city + distance_to_town + is_urban + usage_capacity +
water_source_clean + water_point_population + local_population_1km). The
model's explanatory power is moderate (Tjur's R2 = 0.16). The model's
intercept, corresponding to distance_to_tertiary_road = 0, is at 0.35 (95% CI
[0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_city (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_city = 0, is at 0.35
(95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_town (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_town = 0, is at 0.35
(95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with is_urban (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to is_urban = [?], is at 0.35 (95%
CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with usage_capacity (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to usage_capacity = 300, is at 0.35
(95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_source_clean (formula: status
~ distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_source_clean = Borehole,
is at 0.35 (95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_point_population (formula:
status ~ distance_to_tertiary_road + distance_to_city + distance_to_town +
is_urban + usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_point_population = 0, is
at 0.35 (95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation. and We fitted a logistic
model (estimated using ML) to predict status with local_population_1km
(formula: status ~ distance_to_tertiary_road + distance_to_city +
distance_to_town + is_urban + usage_capacity + water_source_clean +
water_point_population + local_population_1km). The model's explanatory power
is moderate (Tjur's R2 = 0.16). The model's intercept, corresponding to
local_population_1km = 0, is at 0.35 (95% CI [0.15, 0.56], p < .001). Within
this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.

Confusion Matrix

The code chunk below generates the confusion matrix for the model.

blr_confusion_matrix(model2, cutoff = 0.5)
Confusion Matrix and Statistics 

          Reference
Prediction FALSE TRUE
         0  1300  743
         1   814 1899

                Accuracy : 0.6726 
     No Information Rate : 0.4445 

                   Kappa : 0.3348 

McNemars's Test P-Value  : 0.0761 

             Sensitivity : 0.7188 
             Specificity : 0.6149 
          Pos Pred Value : 0.7000 
          Neg Pred Value : 0.6363 
              Prevalence : 0.5555 
          Detection Rate : 0.3993 
    Detection Prevalence : 0.5704 
       Balanced Accuracy : 0.6669 
               Precision : 0.7000 
                  Recall : 0.7188 

        'Positive' Class : 1

The validity of a cut-off is measured using sensitivity, specificity and accuracy.

Sensitivity: The % of correctly classified events out of all events = TP/(TP + FN)

Specificity: The % of correctly classified non-events out of all non-events = TN/(TN + FP)

Accuracy: The % of correctly classified observation over all observations = (TP + TN)/(TP + FP + TN + FN)

The code chunk below creates a spatial data frame.

Osun_wp_sp <- Osun_wp_sf_clean %>%
  select(c(status, distance_to_tertiary_road, distance_to_city, distance_to_town, water_point_population, local_population_1km, is_urban, usage_capacity, water_source_clean)) %>%
  as_Spatial()
Osun_wp_sp
class       : SpatialPointsDataFrame 
features    : 4756 
extent      : 182502.4, 290751, 340054.1, 450905.3  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 9
names       : status, distance_to_tertiary_road, distance_to_city, distance_to_town, water_point_population, local_population_1km, is_urban, usage_capacity, water_source_clean 
min values  :      0,         0.017815121653488, 53.0461399623541, 30.0019777713073,                      0,                    0,        0,           1000,           Borehole 
max values  :      1,          10966.2705628969,  47934.343603562, 44020.6393368124,                  29697,                36118,        1,            300,   Protected Spring 

Fixed Bandwidth

The code chunk below will generate the fixed bandwidth for our spatial data frame.

bw.fixed <- bw.ggwr(status ~ distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban + usage_capacity + water_source_clean + water_point_population + local_population_1km, data = Osun_wp_sp, family = "binomial", approach = "AIC", kernel = "gaussian", adaptive = FALSE, longlat = FALSE)
Take a cup of tea and have a break, it will take a few minutes.
          -----A kind suggestion from GWmodel development group
 Iteration    Log-Likelihood:(With bandwidth:  95768.67 )
=========================
       0        -2890 
       1        -2837 
       2        -2830 
       3        -2829 
       4        -2829 
       5        -2829 
Fixed bandwidth: 95768.67 AICc value: 5681.18 
 Iteration    Log-Likelihood:(With bandwidth:  59200.13 )
=========================
       0        -2878 
       1        -2820 
       2        -2812 
       3        -2810 
       4        -2810 
       5        -2810 
Fixed bandwidth: 59200.13 AICc value: 5645.901 
 Iteration    Log-Likelihood:(With bandwidth:  36599.53 )
=========================
       0        -2854 
       1        -2790 
       2        -2777 
       3        -2774 
       4        -2774 
       5        -2774 
       6        -2774 
Fixed bandwidth: 36599.53 AICc value: 5585.354 
 Iteration    Log-Likelihood:(With bandwidth:  22631.59 )
=========================
       0        -2810 
       1        -2732 
       2        -2711 
       3        -2707 
       4        -2707 
       5        -2707 
       6        -2707 
Fixed bandwidth: 22631.59 AICc value: 5481.877 
 Iteration    Log-Likelihood:(With bandwidth:  13998.93 )
=========================
       0        -2732 
       1        -2635 
       2        -2604 
       3        -2597 
       4        -2596 
       5        -2596 
       6        -2596 
Fixed bandwidth: 13998.93 AICc value: 5333.718 
 Iteration    Log-Likelihood:(With bandwidth:  8663.649 )
=========================
       0        -2624 
       1        -2502 
       2        -2459 
       3        -2447 
       4        -2446 
       5        -2446 
       6        -2446 
       7        -2446 
Fixed bandwidth: 8663.649 AICc value: 5178.493 
 Iteration    Log-Likelihood:(With bandwidth:  5366.266 )
=========================
       0        -2478 
       1        -2319 
       2        -2250 
       3        -2225 
       4        -2219 
       5        -2219 
       6        -2220 
       7        -2220 
       8        -2220 
       9        -2220 
Fixed bandwidth: 5366.266 AICc value: 5022.016 
 Iteration    Log-Likelihood:(With bandwidth:  3328.371 )
=========================
       0        -2222 
       1        -2002 
       2        -1894 
       3        -1838 
       4        -1818 
       5        -1814 
       6        -1814 
Fixed bandwidth: 3328.371 AICc value: 4827.587 
 Iteration    Log-Likelihood:(With bandwidth:  2068.882 )
=========================
       0        -1837 
       1        -1528 
       2        -1357 
       3        -1261 
       4        -1222 
       5        -1222 
Fixed bandwidth: 2068.882 AICc value: 4772.046 
 Iteration    Log-Likelihood:(With bandwidth:  1290.476 )
=========================
       0        -1403 
       1        -1016 
       2       -807.3 
       3       -680.2 
       4       -680.2 
Fixed bandwidth: 1290.476 AICc value: 5809.719 
 Iteration    Log-Likelihood:(With bandwidth:  2549.964 )
=========================
       0        -2019 
       1        -1753 
       2        -1614 
       3        -1538 
       4        -1506 
       5        -1506 
Fixed bandwidth: 2549.964 AICc value: 4764.056 
 Iteration    Log-Likelihood:(With bandwidth:  2847.289 )
=========================
       0        -2108 
       1        -1862 
       2        -1736 
       3        -1670 
       4        -1644 
       5        -1644 
Fixed bandwidth: 2847.289 AICc value: 4791.834 
 Iteration    Log-Likelihood:(With bandwidth:  2366.207 )
=========================
       0        -1955 
       1        -1675 
       2        -1525 
       3        -1441 
       4        -1407 
       5        -1407 
Fixed bandwidth: 2366.207 AICc value: 4755.524 
 Iteration    Log-Likelihood:(With bandwidth:  2252.639 )
=========================
       0        -1913 
       1        -1623 
       2        -1465 
       3        -1376 
       4        -1341 
       5        -1341 
Fixed bandwidth: 2252.639 AICc value: 4759.188 
 Iteration    Log-Likelihood:(With bandwidth:  2436.396 )
=========================
       0        -1980 
       1        -1706 
       2        -1560 
       3        -1479 
       4        -1446 
       5        -1446 
Fixed bandwidth: 2436.396 AICc value: 4756.675 
 Iteration    Log-Likelihood:(With bandwidth:  2322.828 )
=========================
       0        -1940 
       1        -1656 
       2        -1503 
       3        -1417 
       4        -1382 
       5        -1382 
Fixed bandwidth: 2322.828 AICc value: 4756.471 
 Iteration    Log-Likelihood:(With bandwidth:  2393.017 )
=========================
       0        -1965 
       1        -1687 
       2        -1539 
       3        -1456 
       4        -1422 
       5        -1422 
Fixed bandwidth: 2393.017 AICc value: 4755.57 
 Iteration    Log-Likelihood:(With bandwidth:  2349.638 )
=========================
       0        -1949 
       1        -1668 
       2        -1517 
       3        -1432 
       4        -1398 
       5        -1398 
Fixed bandwidth: 2349.638 AICc value: 4755.753 
 Iteration    Log-Likelihood:(With bandwidth:  2376.448 )
=========================
       0        -1959 
       1        -1680 
       2        -1530 
       3        -1447 
       4        -1413 
       5        -1413 
Fixed bandwidth: 2376.448 AICc value: 4755.48 
 Iteration    Log-Likelihood:(With bandwidth:  2382.777 )
=========================
       0        -1961 
       1        -1683 
       2        -1534 
       3        -1450 
       4        -1416 
       5        -1416 
Fixed bandwidth: 2382.777 AICc value: 4755.491 
 Iteration    Log-Likelihood:(With bandwidth:  2372.536 )
=========================
       0        -1958 
       1        -1678 
       2        -1528 
       3        -1445 
       4        -1411 
       5        -1411 
Fixed bandwidth: 2372.536 AICc value: 4755.488 
 Iteration    Log-Likelihood:(With bandwidth:  2378.865 )
=========================
       0        -1960 
       1        -1681 
       2        -1532 
       3        -1448 
       4        -1414 
       5        -1414 
Fixed bandwidth: 2378.865 AICc value: 4755.481 
 Iteration    Log-Likelihood:(With bandwidth:  2374.954 )
=========================
       0        -1959 
       1        -1679 
       2        -1530 
       3        -1446 
       4        -1412 
       5        -1412 
Fixed bandwidth: 2374.954 AICc value: 4755.482 
 Iteration    Log-Likelihood:(With bandwidth:  2377.371 )
=========================
       0        -1959 
       1        -1680 
       2        -1531 
       3        -1447 
       4        -1413 
       5        -1413 
Fixed bandwidth: 2377.371 AICc value: 4755.48 
 Iteration    Log-Likelihood:(With bandwidth:  2377.942 )
=========================
       0        -1960 
       1        -1680 
       2        -1531 
       3        -1448 
       4        -1414 
       5        -1414 
Fixed bandwidth: 2377.942 AICc value: 4755.48 
 Iteration    Log-Likelihood:(With bandwidth:  2377.018 )
=========================
       0        -1959 
       1        -1680 
       2        -1531 
       3        -1447 
       4        -1413 
       5        -1413 
Fixed bandwidth: 2377.018 AICc value: 4755.48 
bw.fixed
[1] 2377.371

Geographically Weighted Logistic Regression (GWLR)

The code chunk below will construct the Geographically Weighted Logistic Regression using the fixed bandwidth that we just derived.

gwlr.fixed <- ggwr.basic(status ~ distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban + usage_capacity + water_source_clean + water_point_population + local_population_1km, data = Osun_wp_sp, bw = bw.fixed, family = "binomial", kernel = "gaussian", adaptive = FALSE, longlat = FALSE)
 Iteration    Log-Likelihood
=========================
       0        -1959 
       1        -1680 
       2        -1531 
       3        -1447 
       4        -1413 
       5        -1413 

Now we view the results of the Geographically Weighted Logistic Regression.

gwlr.fixed
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2022-12-18 01:39:05 
   Call:
   ggwr.basic(formula = status ~ distance_to_tertiary_road + distance_to_city + 
    distance_to_town + is_urban + usage_capacity + water_source_clean + 
    water_point_population + local_population_1km, data = Osun_wp_sp, 
    bw = bw.fixed, family = "binomial", kernel = "gaussian", 
    adaptive = FALSE, longlat = FALSE)

   Dependent (y) variable:  status
   Independent variables:  distance_to_tertiary_road distance_to_city distance_to_town is_urban usage_capacity water_source_clean water_point_population local_population_1km
   Number of data points: 4756
   Used family: binomial
   ***********************************************************************
   *              Results of Generalized linear Regression               *
   ***********************************************************************

Call:
NULL

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-129.368    -1.750     1.074     1.742    34.126  

Coefficients:
                                           Estimate Std. Error z value Pr(>|z|)
Intercept                                 3.540e-01  1.055e-01   3.354 0.000796
distance_to_tertiary_road                 1.001e-04  2.040e-05   4.910 9.13e-07
distance_to_city                         -1.764e-05  3.391e-06  -5.202 1.97e-07
distance_to_town                         -1.544e-05  2.825e-06  -5.466 4.60e-08
is_urbanTRUE                             -2.667e-01  7.474e-02  -3.569 0.000358
usage_capacity1000                       -6.206e-01  6.966e-02  -8.908  < 2e-16
water_source_cleanProtected Shallow Well  4.947e-01  8.496e-02   5.823 5.79e-09
water_source_cleanProtected Spring        1.279e+00  4.384e-01   2.917 0.003530
water_point_population                   -5.098e-04  4.476e-05 -11.390  < 2e-16
local_population_1km                      3.452e-04  1.779e-05  19.407  < 2e-16
                                            
Intercept                                ***
distance_to_tertiary_road                ***
distance_to_city                         ***
distance_to_town                         ***
is_urbanTRUE                             ***
usage_capacity1000                       ***
water_source_cleanProtected Shallow Well ***
water_source_cleanProtected Spring       ** 
water_point_population                   ***
local_population_1km                     ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6534.5  on 4755  degrees of freedom
Residual deviance: 5688.9  on 4746  degrees of freedom
AIC: 5708.9

Number of Fisher Scoring iterations: 5


 AICc:  5708.923
 Pseudo R-square value:  0.129406
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Fixed bandwidth: 2377.371 
   Regression points: the same locations as observations are used.
   Distance metric: A distance matrix is specified for this model calibration.

   ************Summary of Generalized GWR coefficient estimates:**********
                                                   Min.     1st Qu.      Median
   Intercept                                -3.7021e+02 -4.3797e+00  3.5590e+00
   distance_to_tertiary_road                -3.1622e-02 -4.5462e-04  9.1291e-05
   distance_to_city                         -5.4555e-02 -6.5623e-04 -1.3507e-04
   distance_to_town                         -8.6549e-03 -5.2754e-04 -1.6785e-04
   is_urbanTRUE                             -7.3554e+02 -3.4675e+00 -1.6596e+00
   usage_capacity1000                       -5.5889e+01 -1.0347e+00 -4.1960e-01
   water_source_cleanProtected.Shallow.Well -1.8842e+02 -4.7295e-01  6.2378e-01
   water_source_cleanProtected.Spring       -1.3630e+03 -5.3436e+00  2.7714e+00
   water_point_population                   -2.9696e-02 -2.2705e-03 -1.2277e-03
   local_population_1km                     -7.7730e-02  4.4281e-04  1.0548e-03
                                                3rd Qu.      Max.
   Intercept                                 1.3755e+01 2171.6373
   distance_to_tertiary_road                 6.3011e-04    0.0237
   distance_to_city                          1.5921e-04    0.0162
   distance_to_town                          2.4490e-04    0.0179
   is_urbanTRUE                              1.0554e+00  995.1840
   usage_capacity1000                        3.9113e-01    9.2449
   water_source_cleanProtected.Shallow.Well  1.9564e+00   66.8914
   water_source_cleanProtected.Spring        7.0805e+00  208.3749
   water_point_population                    4.5879e-04    0.0765
   local_population_1km                      1.8479e-03    0.0333
   ************************Diagnostic information*************************
   Number of data points: 4756 
   GW Deviance: 2815.659 
   AIC : 4418.776 
   AICc : 4744.213 
   Pseudo R-square value:  0.5691072 

   ***********************************************************************
   Program stops at: 2022-12-18 01:39:47 

To assess the performance of the gwLR, firstly, we will convert the SDF object in as data frame by using the code chunk below

gwr.fixed <- as.data.frame(gwlr.fixed$SDF)

Next, we will label yhat values greater or equal to 0.5 into 1 and else 0. The result of the logic comparison operation will be saved into a field called most.

gwr.fixed <- gwr.fixed %>%
  mutate(most = ifelse(gwr.fixed$yhat >= 0.5, T, F))
gwr.fixed$y <- as.factor(gwr.fixed$y)
gwr.fixed$most <- as.factor(gwr.fixed$most)
CM <- confusionMatrix(data = gwr.fixed$most, reference = gwr.fixed$y)
CM
Confusion Matrix and Statistics

          Reference
Prediction FALSE TRUE
     FALSE  1833  268
     TRUE    281 2374
                                          
               Accuracy : 0.8846          
                 95% CI : (0.8751, 0.8935)
    No Information Rate : 0.5555          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.7661          
                                          
 Mcnemar's Test P-Value : 0.6085          
                                          
            Sensitivity : 0.8671          
            Specificity : 0.8986          
         Pos Pred Value : 0.8724          
         Neg Pred Value : 0.8942          
             Prevalence : 0.4445          
         Detection Rate : 0.3854          
   Detection Prevalence : 0.4418          
      Balanced Accuracy : 0.8828          
                                          
       'Positive' Class : FALSE           
                                          

The code chunk below filters out the following columns from our sf data frame for binding with gwr.fixed.

Osun_wp_sf_selected <- Osun_wp_sf_clean %>%
  select(c(ADM2_EN, ADM2_PCODE, ADM1_EN, ADM1_PCODE, status))
gwr_sf.fixed <- cbind(Osun_wp_sf_selected, gwr.fixed)

Visualing coefficient estimates

The code chunk below is used to create an interactive point symbol map

tmap_mode("view")
tmap mode set to interactive viewing
prob_T <- tm_shape(Osun) +
  tm_polygons(alpha = 0.1) +
  tm_shape(gwr_sf.fixed) +
  tm_dots(col = "yhat", border.col = "gray60", border.lwd = 1) + tm_view(set.zoom.limits = c(8,14))
prob_T

Now we plot the tiertiary_TV and prob_T side by side.

tertiary_TV <- tm_shape(Osun) + tm_polygons(alpha = 0.1) + tm_shape(gwr_sf.fixed) + tm_dots(col = "distance_to_tertiary_road_TV", border.col = "gray60", border.lwd = 1) + tm_view(set.zoom.limits = c(8,14))

tmap_arrange(tertiary_TV, prob_T)
Variable(s) "distance_to_tertiary_road_TV" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Variable(s) "distance_to_tertiary_road_TV" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.